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Summary 

We develop algorithms for performing semiparametric regression analysis in real time, 
with data processed as it is collected and made immediately available via modern telecom- 
munications technologies. Our definition of semiparametric regression is quite broad and 
includes, as special cases, generalized linear mixed models, generalized additive models, 
geostatistical models, wavelet nonparametric regression models and their various combi- 
nations. Fast updating of regression fits is achieved by couching semiparametric regres- 
sion into a Bayesian hierarchical model or, equivalently graphical model framework and 
^ employing online mean field variational ideas. Internet sites attached to this article illus- 

trate the methodology for continually arriving stock market and real estate data. Flexible 
real-time analyses, based on increasingly ubiquitous streaming data sources stand to ben- 
efit. 

Keywords: Approximate Bayesian inference; Generalized additive models; Mean field vari- 
ational Bayes; Mixed models; Online variational Bayes; Penalized splines; Wavelets. 
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1 Introduction 

Ongoing technological advancements mean that data are being collected and made avail- 
able for inference with rapidly increasing volume and speed. There are numerous ex- 
amples of this explosion of data, but two that have established connections with semi- 
parametric regression, our focus in this article, are Internet auction analysis (e.g. Jank & 
Shmueli, 2007) and real-time spatial epidemiology (e.g. Kaimi & Diggle, 2011). 

Semiparametric regression refers to a large class of regression models that provide for 
non-linear predictor effects using spline and wavelet basis functions, as well as depen- 
dencies arising in grouped data such as within-subject correlation. An arsenal of both 
frequentist and Bayesian fitting and inference procedures now exist. Recent overviews are 
contained in Ruppert, Wand & Carroll (2009) and Wand & Ormerod (2011). 

Virtually all semiparametric regression methodology proposed to date assume that the 
data are processed in batch. That is, each iteration of an iterative inference procedure re- 
quires statistics computed across the entire data set. Summaries such as function estimates, 
confidence intervals and posterior density functions are then outputted. Downsides to 
batch processing include the requirement that statistical analysis wait until an entire data 
set has been assembled and, sometimes, the necessity of storing the entire data set in mem- 
ory. In the online case, each iteration of the procedure requires only knowledge of the latest 
data point and any summary statistics from previous iterations. A particular advantage of 
online processing is that summaries, such as those just mentioned, are updated through- 
out the data collection process and therefore are available immediately upon demand. 
Online processing also has the advantage of not requiring storage of potentially very large 
data-sets. 

While a number of batch procedures exist for performing semiparametric regression, 
we focus on a particular methodology here due to the ease of adapting it to the online 
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framework as well as its wide range of applicability. Consider single predictor nonpara- 
metric regression, a special case of semiparametric regression with a long history and large 
literature. Fully automatic nonparametric regression batch procedures include: (a) local 
linear kernel smoother with cross-validation bandwidth selection, (b) local linear kernel 
smoother with direct plug-in bandwidth selection, (c) frequentist low-rank smoothing 
spline with restricted maximum likelihood smoothing parameter selection, (d) Bayesian 
low-rank smoothing spline with Markov chain Monte Carlo approximate inference and 
(e) Bayesian low-rank smoothing spline with mean field variational Bayesian (MFVB) ap- 
proximate inference. Details of (a) are in Hardle (1990), details of (b) are in Wand & Jones 
(1995), whilst (c) and (d) are described in Ruppert, Wand & Carroll (2003). Section 2.7 
of Wand & Ormerod (2011) explains (e). Approaches (a)-(d) are more established, but 
none have a viable online modification. However, (e) is relatively easy to modify for this 
purpose. 

Another advantage of the Bayesian low-rank smoothing spline approach to nonpara- 
metric regression is its extendibility. As explained in Wand (2009), couching semiparamet- 
ric regression in a graphical models framework permits arbitrarily sophisticated models to 
be handled elegantly, efficiently, and cohesively. This approach can handle generalized ad- 
ditive models, geostatistical models, wavelet nonparametric regression models and their 
various combinations, as well complications such as outliers and missingness. Inference 
in these models is often accomplished by applying Markov chain Monte Carlo procedures 
using the directed acyclic graph of variable dependencies. While versatile and accurate, 
such inference procedures can be unacceptably slow. MFVB approaches, as demonstrated 
in Faes, Ormerod & Wand (2011) and Wand & Ormerod (2011), are a much faster alterna- 
tive. Some accuracy and versatility must be sacrificed in return for the increased speed 
of MFVB. Nonetheless, for the models treated in this article MFVB accuracy ranges from 
good to excellent. 

Iterative algorithms that make a single pass through the data, with one iteration per 
data point, have recently been developed for variational Bayesian inference. In the ma- 
chine learning literature, Hoffman, Blei & Bach (2010) introduced such an MFVB algorithm 
for latent Dirichlet allocation and applied their algorithm to topic modeling. This proce- 
dure was extended to the hierarchical Dirichlet process by Wang, Paisley & Blei (2011). 
Tchumtchoua, Dunson & Morris (2012) further developed online MFVB approximate in- 
ference for high-dimensional correlated data. These articles are referred to as online mean 
field variational Bayes or often with the shorter name online variational Bayes. While they are 
indeed single-pass and require storing only one data point in memory, they do, however, 
require knowledge of the number of data points from the start of the algorithm. Our focus 
in this work, by contrast, is not on transforming MFVB algorithms that require multiple 
data passes into single-pass algorithms. Rather, we are, in some sense, pursuing a more 
classical definition of an "online algorithm" in that each iteration of our procedure uses 
past data only in the form of sufficient statistics and future data not at all. 

Online MFVB has not been entertained previously for nonparametric and semipara- 
metric regression. But there is an old and large literature involving other online ap- 
proaches. For the nonparametric regression and the related density estimation problem 
Wolverton & Wagner (1967), Yamato (1971), Devroye & Wagner (1980) and Krzyak & 
Pawlak (1984) are examples of early articles on online analysis using kernel estimators. 
However, they are chiefly concerned with theoretical properties of the estimators and are 
devoid of practical automatic smoothing parameter selection strategies. 

Outside of semiparametric regression there are also large literatures on online analysis. 
A few recent examples are: Ng, McLachlan & Lee (2006) on prediction of hospital resource 
utilization, Fricker & Chang (2008) on biosurveillance and Kaimi & Diggle (2011) on mon- 
itoring of variation in risk of infections. A very recent article by Michalak et al. (2012) 
describes the development of systems for real-time streaming analysis. 

Semiparametric regression is a highly visual branch of Statistics, with graphics being 



2 



a crucial means of conveying and diagnosing regression fits. The norm for such graphical 
display are ink drawings on pieces of paper or figures in PDF file. Real-time semipara- 
metric regression represents a paradigm shift in graphical display, where regression sum- 
maries are best thought of as dynamic graphics on web-pages or iDevice apps. We have 
organized an Internet site that illustrates real-time semiparametric regression graphical 
display. 

Section [2] introduces the notion of real-time semiparametric regression based online 
MFVB via increasingly more sophisticated Gaussian response models. Both classical and 
sparse shrinkage are treated. The more challenging binary response case is dealt with in 
Section [3] Dynamic web-pages that illustrate the new methodology on live data are the 
focus of Section |U 

2 Gaussian Response Models 

The conversion of a batch MFVB semiparametric regression procedure to one that does on- 
line processing is particularly straightforward in the Gaussian response case. We start by 
explaining such conversion for the multiple linear regression model, since it has minimal 
notational overhead. 

2.1 Multiple linear regression 

Let X be a p x n design matrix and consider the Bayesian regression model 

y\(3,a ~ N{X(3,<r 2 I), f3 ~ JV(0, oj I), a ~ Half-Cauchy(^). (1) 

An equivalent, but more tractable model, is that where a ~ Half-Cauchy(^4) is replaced 
by the auxiliary variable representation 

a 2 1 a ~ Inverse-Gamma ( | , 1 /a) , a ~ Inverse-Gamma ( \ , 1 / A 2 ) . (2) 

Figure [T] displays the directed acyclic graph corresponding to the model conveyed by |T]) 
and @. 
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Figure 1: Directed acyclic graph for the model conveyed by Q and Q. The shading corresponds 
to the observed data. 

If the joint posterior density function p((3, a 2 , a\y) is restricted to having the product 
form 

q((3,a)q(a 2 ) (3) 

for density functions q(f3, a) and q{cr 2 ) then the optimal density functions in terms of min- 
imum Kullback-Leibler distance can be found from the batch MFVB algorithm, Algorithm 
[l] The optimal density functions are denoted by q*(j3,a) and q*(a 2 ). The MFVB solu- 
tion is also such that q*((3,a) = q*(/3) q*{a) even though <|3j does not assume this. The 
approximate marginal likelihood induced by product restriction d3| is denoted by p(y; q) 
and is a lower bound on the exact marginal likelihood p(y). See, for example, Section 2.2 
of Ormerod & Wand (2010) for the underpinning of Algorithm [l] General references on 
MFVB include Bishop (2006) and Wainwright & Jordan (2008). 
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Algorithm 1 Batch mean field variational Bayes algorithm for approximate inference in the Gaus- 
sian response linear regression model Q and |2]). 

Initialize: li q (\/ a 2) > 0. 

Read in y (n x 1) and I(nx p). 

Cycle: 

S g(/3) «- |^g(l/(7 2 ) + CT^ 2 

/*9(/3) ^" M g (l/<r 2 ) ^g(/3) -^ T l/ / Mg(l/a) <~ ViZ^l/a- 2 ) + A~ 2 } 

n + 1 

^ (1/CT2) ^ 2 ^ (1/a) + y^, - 2^ q{f3) X T y + tr[(X T X){£, (/3) + /^/x^}] 
until the increase in p(y ; q) is negligible. 
Produce summaries based on q*((3) ~ N(fi q ^,'E q ^) and 
g*(<r 2 ) ~ Inverse-Gamma^ (ra + 1), (n + l)/{2/i ? ( 1 / (T 2)}). 



The approximate marginal log-likelihood, used to monitor convergence in Algorithm 
[lj has explicit expression 

log p(y;q) = ip-inlog(27r)-21og(7r) + logr(i(n+l)) 

-\p log(oj) - log(A) - ^|{||^ (/3) || 2 +tr(S g(/3) )} 

+|log|E 9( ^| - i(n + l)log[(n + l)/{2 / u 9(1/(T 2 ) }] 

- log(Mg(l/ CT 2) + A~ 2 ) + M g (l/ CT 2)/Z 9 (l/ a ). 

In Algorithm [lj dependence on the data is only through the quantities y T y, X T y and 
X T X and each of these have simple updates when a new response y new and its corre- 
sponding p x 1 vector of predictors x nevj arrives. For example, the new X T X matrix is 

new new ^ v ^ v I •^new J ^ new - 

Based on these observations Algorithm |2j the online modification of the Algorithm [lj en- 
sues. 

Algorithm [2] differs from Algorithm [T] in that the data are processed on arrival and 
the approximate posterior densities of the model parameters are continually updated. In 
the case of streaming data there is the option of dynamic graphical displays of the ap- 
proximate posterior density functions of the regression coefficients and error variance and 
corresponding approximate Bayes estimates and credible sets. Dynamic regression diag- 
nostic plots could also be entertained. 

Figure [2] provides rudimentary illustration of online regression inference when data 
from the Vietnam World Bank Living Standards Survey (source: Cameron & Trivedi, 2005) 
is fed into Algorithm [2] These data are in the VietNamI data-frame of the R package 
Ecdat (Croissant, 2011). The response variable is the logarithm of total medical expenses. 
Description of the predictor variables is given in the VietNamI documentation of Cros- 
saint (2011). Each variable was transformed to lie inside the unit interval before being 
processed. Note that the scaling was determined using an initialization batch just as for 
the initial parameter tuning; therefore, the single-pass aspect of the algorithm is main- 
tained. The posterior density functions were then back-transformed to correspond to the 
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Algorithm 2 Online mean field variational Bayes algorithm for approximate inference in the Gaus- 
sian response linear regression model Q and 

Initialize: Mq(i/ CT 2) > 0, y T y <- 0, X T y <- pxl/ X T X <- pxp , n ^ 0. 

Cycle: 

read in y new (1 x 1) and x new (p x 1) ; n «— n + 1 

y T y <- y T y + y 2 w ; X T y <- X T y + a; new y new ; X T X <- X T X + a? new z:J ew 

S g(/3) <~ |^g(l/(7 2 ) + CX^ 2 /j 

Mg(/3) «~ Mq(l/<r 2 ) ^ T 2/ J Mg(l/o) «~ 1 /{^g(l/cr 2 ) + A ~ 2 } 

n + 1 

NWa2) *~ 2 N[1/a) + ^ - 2/^X^ + tr[(X T X){V m + ^x^}] 
produce summaries based on q*((3) ~ N(fi q ^, 7 S q ^) and 
g*(o- 2 ) ~ Inverse-Gamma(i(n + 1), (n + l)/{2/i g ( 1 / (j2 )}) 

until data no longer available or analysis terminated. 



original units. The hyperparameters were set at a\ = 10 10 and A = 10 5 to impose non- 
informativity. 

Note, for example, the approximate posterior density functions for f$2> the regression 
coefficient attached to age of household head. For n < 100 the posterior density 
function is relatively flat and 02 is not statistically significant. As n increases, the posterior 
density functions become narrower and, by n = 250, the lower limit of the 95% credible set 
is positive - indicating statistical significance of this predictor. Also, in the bottom row, the 
posterior mean of the error variance, a 2 , is seen to gradually move to the left as n increases 
- reflecting the increased precision as more data arrives. 

The right-most column of Figure [2] shows the batch MFVB posterior density functions 
for n = 250. In this case, the batch and online MFVB results are seen to be virtually 
identical. However, as demonstrated later, such agreement is not guaranteed in general. 

2.1.1 Batch-based Tuning and Convergence Diagnosis 

Ideally, the online Algorithm [2] will mimic the results of the batch Algorithm [T] as the sam- 
ple size n increases. However, we know of no guarantees that this will happen and it 
is possible that the online parameters will diverge from their batch counterparts. For 
the more elaborate models studied later in this article, such divergence is very common. 
Therefore, convergence diagnosis at the start of the online iterations is essential. The prin- 
cipal idea is to start by running a small subset of initial data points in the batch algorithm 
to obtain starting values for both data sufficient statistics and, more importantly, estimated 
parameters of the model. A second, small validation subset of data is used to compare the 
batch and online algorithm results. If convergence of the online iterations to their batch 
counterparts is not verified by this comparison then larger initial batch runs are required 
to tune the online algorithm. 

The idea of collecting some of the data into a small subset to improve performance of 
an algorithm where one new data point is introduced at each iteration is reminiscent of the 
"mini-batches" of Hoffman, Blei & Bach (2010). However, in our approach, the batching 
of data happens only with a small subset at the very beginning of the algorithm rather 
than throughout. Also, we develop an alternative tuning method for this subset batch size 



5 



n=50 n=100 n=150 n=200 n=250 



intercept 



number of 
pharmacy visits 



age of 
household head 



indicator 
of male 



indicator 
of married 



indicator of 
diploma education 



number of 
illnesses in 
the past year 



indicator of 
injury 



number of 
illness days 



number days of 
limited activity 



indicator of 
health insurance 



error variance 





1 — i — i — i — i — i — r n — i — i — i — i — i — r Ti i — r~i — r~r n rn — i — i — i — r T i i i — i n 

1.6 2.0 2.4 2.8 1.6 2.0 2.4 2.8 1.6 2.0 2.4 2.8 1.6 2.0 2.4 2.8 1.6 2.0 2.4 2.8 





-0.15 0.00 0.15 -0.15 0.00 0.15 -0.15 0.00 0.15 -0.15 0.00 0.15 -0.15 0.00 0.15 




"1 — I — T 



~i — r 



yv i vy 

— i — i — i i i — i — i — i 



0.0 0.2 0.0 0.2 0.0 0.2 0.0 0.2 0.0 0.2 




-2-10 1 -2-10 1 -2-10 1 -2-10 1 -2-10 1 




I I I I T 

-0.04 -0.01 0.02 -0.04 -0.01 0.02 -0.04 -0.01 0.02 -0.04 -0.01 0.02 -0.04 -0.01 0.02 



i — r 



T — i — r 



-0.1 0.1 -0.1 0.1 -0.1 0.1 -0.1 0.1 -0.1 0.1 




Figure 2: Successive approximate posterior density functions of regression coefficients and the log- 
arithm of error variance for the Vietnam medical expenses data described in the text. The predictors 
corresponding to each regression coefficient are listed in the left-hand columns. The posterior den- 
sity functions are based on online MFVB as detailed in Algorithm^ The axis limits are the same 
across each row and a vertical line is positioned at zero. For n = 250 the batch MFVB approximate 
fits are shown as thick grey curves. 

below; notably, our tuning method requires batching only some initial subset of the data 
rather than the full data set. 

We will now provide details via the Figure [2] example. Figure [3] shows the posterior 
means and 95% credible sets for each j3j, < j < 11, and log(o^) and samples sizes 



6 



intercept 
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Figure 3: Convergence diagnostics for the example given in Figure^ The horizontal axes show the 
sample sizes between a warm-up batch sample of size n warm = 100 and validation sample sizes up 
to rivaiid = 100 greater than n war m- 

n = 100, 110, . . . , 200 when the Vietnam medical expenses data are fitted via both batch 
and online MFVB. The batch MFVB summary statistics (shown as grey lines in Figure [3]) 
correspond to simply inputting the first n warm = 100 observations into Algorithm [I] and 
then repeating this process for 10 additional equally-spaced sample sizes that are n va ijd = 
100 greater than n warm . The largest sample size is then n war m + n V alid = 200. The online 
results (shown as grey lines in Figure [3]l were obtained via online MFVB updating steps 
of Algorithm |2]but with p> q (i/ a 2), y T y, X T y, X T X and n initialized at the values obtained 
when the first n warm = 100 observations are inputted into Algorithm [l] This implies that 
all the results are identical at n = n warm = 100, but there are some small discrepancies for 
n > 100. In this example the discrepancies are negligible, and hard to discern from Figure 
[3]- indicating convergence of the online MFVB algorithm. Figure [5] in Section [3] shows an 
example where convergence is not achieved with n warm = 100 and a larger warm-up is 
required. 

Algorithm 2' is a modification of Algorithm [2] that incorporates batch-based tuning 
and convergence diagnostics. Whilst such modification it is not necessary for the example 
depicted in Figures [2] and |3j it is crucial for more sophisticated semiparametric models 
such as those described later in this article. 
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1. Set 

Wwarm to be the warm-up sample size and re V aiid to be size of the validation 
period. Read in the first n warm + n va i ic j response and predictor values. 

2. Create y waTm and X waim consisting of the first n warm response and predictor values. 

3. Feed y waTm and X warm into the batch MFVB Algorithm [I] to obtain a starting value 

Ior ^(1/<T 2 )- 

4. Set y y <— 2/warm^warm/ V ^~ warml/warm' X X <— X warm X waTm 

and n <- n warm . 

5. Run the online MFVB Algorithm [2] until n = n warm + n va [ i( x. 

6. Use convergence diagnostic graphics to assess whether the online parameters are 
converging to the batch parameters. 

(a) If not converging then return to Step 1 and increase n warm . 

(b) If converging then continue running the online MFVB Algorithm [2] until data 
no longer available or analysis terminated. 



Algorithm 2': Modification of Algorithm^to include batch-based tuning and convergence diag- 
nosis. 

One could contemplate automating Step 6 of Algorithm 2', to save the user from having 
to conduct diagnostic checks. However, we have not yet explored automatic convergence 
diagnosis and, instead, flag this as a problem worthy of future research. 

2.1.2 Model assumptions 

The online MFVB Algorithms 2' is founded upon the same assumptions as its batch coun- 
terpart Algorithm [T] Both algorithms fit the Bayesian linear regression model (|TJ, but the 
latter has the option to do the fitting in real time for sequentially arriving data. 

Throughout this article, we are not allowing for the model parameters to change as 
new data arrive. Colloquially, we assume 'fixed targets" rather than "moving targets". 
Extensions to semiparametric regression scenarios where the model parameters drift over 
time, and real-time algorithms that adapt to such drifts, is certainly worthy of future in- 
vestigation - but beyond this article's scope. 

2.2 Linear Mixed Models 

A very useful structure for semiparametric regression is the class of Bayesian linear mixed 
models of the form 

y\ (3, u, a 2 £ ~ N(Xf3 + Zu. a 2 £ I) 

(4) 

• • • > o"«r ~ A r (0,blockdiag(^ 1 I Kl ,..., o\ T I Kr )) 

where y is an n x 1 vector of response variables, j3 is a p x 1 vector of fixed effects, u 
is a vector of random effects, X and Z corresponding design matrices, a 2 is the error 
variance and a\ x , . . . , a 2 r are variance parameters corresponding to sub-blocks of u of size 
K\, . . . , K r . Here the priors are taken to be 

(3 ~ N(0, all), ~ Half-CauchyO^), 1 < I < r, a e ~ Half-Cauchy(A £ ) (5) 
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with the hyperparameters satisfying a^,A e ,A u £ > for 1 < £ < r. As in Section |2j 
tractability considerations motivate the introduction of auxiliary variables 



a u i ~ Inverse-Gamma (|, and a £ ~ Inverse-Gamma 1/A^) (6) 

and use of the analogue of ^ to induce Half-Cauchy priors on the standard deviation 
parameters. 

As spelt out in Section 2 of Zhao, Staudenmayer, Coull & Wand (2006), model (|4j) — 
encompasses a rich class of models including (with example number from Zhao et al. 2006 
added): 

• simple random effects models (Examples 1 and 2), 

• cross random effects models (Example 3), 

• nested random effects models (Example 4), 

• generalized additive models (Example 6), 

• semiparametric mixed models (Example 7), 

• bivariate smoothing and geoadditive models extensions (Example 8). 

Examples 2 and 6 of Zhao et al. (2006) actually involve 2x2 and 3x3 unstructured co- 
variance matrix parameters which are not covered by ^ and Algorithm [3] However, as 



discussed in Section 2.3 the unstructured covariance matrix extension is quite straightfor- 
ward. 

Algorithm [3] describes online MFVB fitting of the linear mixed model with prod- 

uct restriction 

) = q((3,u,a u i, . . . ,a ur ,a £ )q(a ul , . . .,a ur ,a e ). 

Batch MFVB fitting of (Q, but with slightly different prior distributions, is given by Algo- 
rithm 3 of Ormerod & Wand (2010), where the notation 

C = [X Z] 

is used. Let P be the number of columns in C. Then each pass of the corresponding online 
MFVB algorithm involves arrival and processing of a new scalar response measurement, 
y new , and a P x 1, Cn ew , corresponding to the new row of C. Algorithm [3] facilitates online 
fitting of via natural modification of the aforementioned batch MFVB algorithm. 

The Cn ew vector will have different forms depending on the type of linear mixed model. 
To better understand the nature these forms, consider the following two special cases of 

©= 

VijlPo, Ui, Pi ~ N(p + Ui + pi Xij,al), l<i<m, 1 < j < m, 

Ui\al^N(0,all), fa/h ~ JV(0,oJ), (7) 

a u ~ Half-Cauchy(yl u ), a £ ~ Half-Cauchy (A e ) 

and 

/ K s K t \ 

■jji\Po, Ps, Pt,u s , u t ~ N I /3 + p s Si + fa U + X] Us ' k + X Ut > k 4(**)> a l ' 



\ k=l k=l / 

1 < i < n, u s = [u S)1 ■ ■ ■ u s , Ks ] T , u t = [it ti i • • • u t ,K t ] T , 
u s \al s ~N(0,al tS I), u t \ a\ t ~ JV(0, a\ t I) , 

Po, Ps,Pt ~ N(0, cr|), cr niS ~ Half-Cauchy (A UjS ), a ufi ~ Half-Cauchy (A U)t 

a £ ~ Half-Cauchy(^4 e ). 



(8) 
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Algorithm 3 Online mean field variational Bayes algorithm for approximate inference in the Gaus- 
sian response linear mixed model Q. 

1. Perform batch-based tuning runs analogous to those described in Algorithm 2' and 
determine a warm-up sample size n war m for which convergence is validated. 

2. Set y warm and C warm to be the response vector and design matrix based on the first 

Kwami observations. Then set y^y i — ^/warml/warm' y i — Cwarm2/warm' 

c'c «- 

^ warm ^ warm/ n i n warm- 

Also, set Mg(i/<r|) to be the value for this quantity obtained 
in the batch-based tuning run with sample size n war m- 

3. Cycle: 

read in y new (1 x 1) and c new (Pxl) ; n <- n + 1 

y T y <- y T y + y n 2 ew ; C T y <- C T y + c new y new ; C T C <- C T C + c new cj ew 

Mg(i/o|) C T C + blockdiag{^ 2 I p , ^ (1/o a jl ^ , . . . , /^(l/^)-^} 



■•gOa,«) 



Pq{P,u) <~ M ? (l/<j|) S g(^,it) C 2/ ; Mg(l/a e ) ^ l/{A t 9 (l/ CT |) + A } 

n + 1 

/i<?(V<T?) *~ 2 Wae) + - ^~d^y + tr[(C T C){S g(/3iU) + /t^g^^}] 
For £ = 1, . . . , r : 

M 3 (l/a u «) <~ 1 /{Vq(l/al e )+ A ui} 

K e + l 

+ || 2 +tr(S ?K) ) 

until data no longer available or analysis terminated. 
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Here and throught ~ denotes "distributed independently". 

Model ^ is the random intercept extension of simple linear regression for longitudinal 
data with denoting the jth predictor /response pair for the ith group, with m 

denoting the number of groups. There is no intrinsic reason to insist that the observations 
arrive in order with respect to the i, j subscripting. Hence c new will have the form 



C nL , 



1 

'-'new 



Here x new is the new predictor measurement that partners y nevi and e new is a m x 1 vector 
with an entry of 1 in position i ne „, corresponding to the group that (x nevf , y new ) is from, and 
zeroes elsewhere. 

Model |8| is a mixed model-based penalized spline version of the additive model 

Hi = Pa + fs(si) + ft(U) + £j, 1 < i < n, 

where the Sj and U are continuous predictor measurements and f s and ft are smooth func- 
tions. The functions z%.(-), 1 < k < K s , are spline basis functions. A simple example is the 
truncated line basis 

4(s) = (s- Kk ) + (9) 

where k\ , . . . , k s Ks are a set of knots within the the domain of the Sj values. More so- 
phisticated, and numerically stable, options for Zk{s) are described in, for example, Wood 
(2006), Welham et al. (2007) and Wand & Ormerod (2008). We use the last of these, known 
as O'Sullivan splines, in our examples. The z\,{-),l <k < K t , are defined similarly. A key 
feature of the and z|(-) is that the multiple-of-diagonal covariance matrices are appro- 
priate under mixed model representations of penalized splines. This subtlety is explained 
in Section 4 of Wand & Ormerod (2008). 

Online fitting of ([Ej involves reading in vectors of the form 

Qiew [l-i ^new; ^new; -^l(^new)? • * • ; ^fCg ( ^new ) ; ^l(^new); • • • ; ^/C/ C^'new). 

where s new and i new are the new predictor measurements that partner y new . There is, how- 
ever, the issue of having to set the spline basis functions in advance. For instance, if the 
truncated line basis Q is used then the knots have to be set at or near the start of the algo- 
rithm. For many applications this is not a major problem. For example, if the s new values 
correspond to age, in years, of human adults then the range of possible Si values is easy to 
specify and a reasonable spline basis can be set in advance. In a similar vein, for longitu- 
dinal data, Algorithm|3]assumes that the number of groups is set in advance. If the groups 
correspond to the counties of a geographical entity then this should not pose a problem. If 
the data are from a medical study then Algorithm [3] assumes that the number of patients 
and their identity numbers are fixed in advance. If this is not a reasonable assumption 
then some adjustment is required. 

Finally, we mention the possibility of speeding up the most expensive update: 



S ?(/3,u) 



^(i/ CT §) C T C + blockdiag-^ 2 I p , fi q ( 1/a ajI Kl n q(1/a 2 r) I Kr } . (10) 



For Model (|7j| the dimension of the matrix requiring inversion has dimension (2 + m) x 
(2 + m). If the number of groups is high then naive implementation could lead to a bot- 
tleneck at (fToh. In the batch case it is well-known (e.g. Smith & Wand, 2008) that C T C 



contains diagonal forms that allow 0{m) computation of the right-hand side of ( 10 1. Such 
efficiencies are available in the online case, but require careful rearrangement of the entries 
of C T C during the updates. 
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2.3 Extension to Unstructured Covariance Matrices for Random Effects 

A random intercepts and slopes extension of |7|| is one with the first two hierarchical levels 
set to 

VijlfaUi,?! ~ N(Po + Ui + (J3i + Vdxij,<T*), l<i<m, l<j<m, 



and 



Ui 
Vi 



~iV(0,E), where £ = 



2 

&u Puv @u @v 

2 

Puv &U @V &v 



is an unstructured 2x2 covariance matrix. The conjugate prior for E is the Inverse Wishart 
distribution. However, the specification 



El a uv \,a UV 2 ~ Inverse-Wishart ( v + 1, 2v 



1/dv.vX 

l/a uv2 

a U vi , a uv2 ~ Inverse-Gamma (\ , 1/ A w ) , i/, A ni , > 

provides a covariance matrix extension of a u ~ Half-Cauchy(A M ). The choice v = 2 is 
particularly attractive since it imposes a Uniform(— 1, 1) distribution on p uv and Half-*2 
distributions on a u and a v . This is laid out in Huang & Wand (2012), including the defini- 
tion of the Inverse-Wishart (a, B) distribution. 

Extensions to more sophisticated models, possibly having larger unstructured covari- 
ance matrices, can be done in a similar fashion. 

2.4 Extension to Sparse Shrinkage Penalties 

Model (Q involves the following Gaussian penalization on sub-vectors of u: 

u e \a 2 ue ~ N(0,a 2 ui I), 1 < £ < r. (11) 

However, many models of current-day interest, such as wide data ("p 3> n") and wavelet 
regression, require an assumption that the regression coefficients are sparse. Under such 
sparseness assumptions, the Gaussian priors flT) are not appropriate since they induce 
a relatively gentle amount of penalization that lacks the ability to annihilate regression 
coefficients during fitting and inference. 

For simplicity of exposition we will confine discussion of the sparse shrinkage exten- 
sion to the r = 1 version of (Q. Hence we retain 

y\(3,u, a\ ~ N(X{3 + Z u, a 2 1) 

without any sub-division of u. Let K be the dimension of u and consider general mutually 
independent prior penalization's of the form: 

u k ~ p(u;a u ,0) 

where p(-;a u ,0) is a density function with scale parameter a u and shape parameter 6. 
Options for p(u; 1, 0) include: 

p(u; 1, w) = w{\ exp(— + (1 — w) Sq(u) (Laplace-Zero), 

p(u; 1) = {2TT 3 )- 1 / 2 exp(u 2 /2)£i(u 2 /2) (Horseshoe), 

A2 A r(A+ — ) (12) 
p(u;l,X) = — 1/2 2 exp(w 2 /4) D-2\-i(\u\) (Normal-Exponential-Gamma) 

and p(u; A) = — -. - - . (Generalized Double Pareto). 

2(1 + |n|/A) A+1 
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Here 5q denotes the Dirac delta function with mass at zero, E\ denotes the exponential in- 
tegral function of order 1 and D v denotes the parabolic cylinder function of order v accord- 
ing to the definitions of Gradshteyn & Ryzhik (1994). References for the development of 
these sparse shrinkage priors are Johnstone & Silverman (2005) (Laplace-Zero), Carvalho, 
Poison & Scott (2010) (Horseshoe), Griffin & Brown (2011) (Normal-Exponential-Gamma) 
and Armagan, Dunson & Lee (2012) (Generalized Double Pareto). 

Batch MFVB algorithms for the priors have recently been derived by Wand & 
Ormerod (2011) (Laplace-Zero prior) and Neville, Ormerod & Wand (2012) (Horseshoe, 
Normal-Exponential-Gamma and Generalized Double Pareto priors). 

Algorithm [4] is the online adaptation of Algorithm 4 of Wand & Ormerod (2011) for the 
Laplace-Zero prior model: 

y\(3, 7 , v, a 2 ~ N(X(3 + Z( 7 v), a*I), v\a 2 u ,b~ N{0, a 2 u diag(b)- 1 ), 
a u I a u ~ Inverse-Gamma (| , l/a u ) , a 2 \ a £ ~ Inverse-Gamma ( \ , 1 /a e ) , 

(13) 

(3 ~ N(0,agl), a u ~ Inverse-Gamma(i, a £ ~ Inverse-Gamma^, l/A 2 ), 



bk ~ Inverse-Gamma(l, \), 7&| p ~ Bernoulli (p), p ~ Beta(^4 p , B 



Note that AO B denotes the element-wise product of matrices A and B having the same 
dimensions. Model p3| is a reproduction of (30) in Wand & Ormerod (2011) and the addi- 
tional notation is explained there. Note, in particular, that the Laplace-Zero prior is han- 
dled via the introduction of auxiliary variables b, 7 and v. Section 3.6 of Wand & Ormerod 
(2011) provides the necessary details. Similar online MFVB algorithms for the continuous 
sparse signal shrinkage priors listed in p2| follow from the batch MFVB algorithms of 
Neville, Ormerod & Wand (2012). 



As with the spline-based semiparametric regression models described in Section 2.2 
the wavelet-based models described here benefit from the low-rank property laid out in 
Section 3.1 of Wand & Ormerod (2011). This property entails that the basis functions are 
fixed once and for all during the warm-up period. This permits fast updating of wavelet 
nonparametric fits as new data arrive. A cost of this approach is that the domain of pre- 



dictors needs to be specified based on the warm-up data. As explained in Section 2.2 this 
will often be reasonable. Of course, there is always the possibility of new predictor values 
landing outside domain of the basis functions, in which case some modification may be 
necessary. 

Figure [4] illustrates online wavelet nonparametric regression for data generated to 

x new ~ Uniform(0,l), y new | x new ~ iV( /wo 1) 

where / wo is defined by (20) of Wand & Ormerod (2011). The warm-up sample size is 
«warm = 300. The desired improvement in the estimate of f w0 as n increases is clearly 
apparent. Convergence to the batch MFVB estimate was found to be excellent in this case. 



3 Binary Response Models 

The binary response model we consider here takes the same form as ^ and (|5j, but with 
a £ removed and 

y\/3, u ~ Bernoulli{logit _1 (X /3 + Zu)}. (14) 

Note that ( |l4") | is a convenient shorthand for the entries of y, conditional on (f3, u), being 
independent and with ith entry Bernoulli [logit _1 {(X (3 + Z u)i}]. 

Batch MFVB algorithms for approximate inference in p4| , Q and ^ start with the 
product restriction 

p(f3,u,a u i, . . .,aur,<rli, ■ ■ -^Ir) ~ q((3,u,a u i, . . .,a ur )q(al 1 , . . . ,a£ r ). 
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Algorithm 4 Mean field variational Bayes algorithm for the determination of the optimal parame- 
ters in q* ((3, v), q* (7), q* (<r^) and q* (a^)for the Bayesian sparse signal regression model (13 >. 



1. Perform batch-based tuning runs analogous to those described in Algorithm 2' and 
determine a warm-up sample size ra warm for which convergence is validated. The 
batch MFVB algorithm is Algorithm 4 of Wand & Ormerod (2011). 



2. Set y warm and C war m — 
based on the first n warm 



warm^warm/ fl 



[1 Z warm ] to be the response vector and design matrix 
observations. Then set y T y <- Z/warm2/warm/ Z T 1 <- 

C T C <- 



c 



\D/ Z Z i Z waim Z waTm , 
rm . Set K to be the number of columns in Z 



C 1 y <- C. 



warm!/warm' 

T V 

warm y warm' 



^(1/(72)^(1/0-2), flq(l/ag), Vq(l/a £ ), Vq(l/a v )> Vq(b)> Vq(w y ) 



and fi 



g(to-y) 



Also, set 
to be the values 



warm- 



for these quantities obtained in the batch-based tuning run with sample size n v 
3. Cycle: 

read in y new (1 x 1) and z new (if X 1) ; n 4- n + 1 ; c new <- [1 jz new ] 



y 1 y <- y + «L ; <- zT i + -z new ; z T y <- z T y + z ne w z/w 



Z T Z <- Z T Z + z 



ew Vnew / 

C'C 4-C 1 C + 



'new 



M ? (l/o-2)(C C) © ^(""M + 







/i,(i/o3)diag(fig W ) 



-1 



/V/3>«) <- Mg(i/o-|)5] (? (/3,«)diag{/x g(li ,^)}C T y 

<~ [^(i/a2){diagonal(S gW ) + /x2 (i;) }]- 1 /2 



ft 



r/ g(7) «- -i/i g(1/(7 2 ) ^diagonal(Z T Z)0{cr2 H + / x2 (i)) }_2(Z T y)0/i 9W 

+ 2(Z T 1) { [S,^) )] i= l )2 <j<ifH-l + V>q(fi)Vq(v)} 

+2 diagonal{Z T Z diag(/i, (7) )£ 9( „)} 

-2 diagonal(Z T Z) // ? ^ diagona^S^)) 

+ 2 /^(„) © {^-^(^(7) Vq(v)) ~ diagonal(Z T Z) /x g(7) fx q[v) } 

+ 1p(A p + /i,( 7 .)) - 1p(B p + K~ /ig( 7 .)) 



exp(7 ?(?(7) ) _ 



1 

. /*?(7) 



•q{w y ) <~ diag{/x 9(lu ^) (1 - ^ ? (^))} + /V™-?) ^*g(to 7 ) 

Mg(l/o e ) 1 /{M«(1/<t|) +^7 2 } ; Mg(l/o„) <~ !/ {^(l/crg) + -4^ 2 } 
M 3 (l/a e ) + I 2/ T y - (/*«(w-r) C T 2/ 



+ 2 tr (C T C fi <?(™ T ) { S <z(/3,t>) + /V/3>w)^9(/3,w)} 
M,(i/o„) + ^5f>)^ dia S onal ( S 9(«)) + /*,(«)} 

X (K + l)/B q(a 2 u) ; llq(l/al) <~ \{ n + l )/ B q{*1) 



until data no longer available or analysis terminated 
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Figure 4: Examples of online MFVB wavelet fits based on Algorithm^ The true regression curve 
is the function f wo defined in Wand & Ormerod (2011). 



The resultant updates for the a^ £ and a U £ are the same as in the Gaussian response case. 
The optimal (/-density for (J3, u) satisfies 

9* (/3, u) oc exp L T (X(3 + Zu) - l T log(l + e Xf3+Zu ) - ^ ||/3|| 2 ~ IZXd/^) \\M 2 ) 

I U P £=1 J 

(15) 

However, this is a non-standard form and poses tractability problems with regards to ap- 
proximate inference for (/3, u). A reasonable remedy is to replace ( |l5") by a member of the 
following family of Multivariate Normal approximations: 

q*(/3,u) ~ N(n q{/3:U .^,'Sg {f3;U .^) 
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where 

^g(/3,u) = 2C T diag{A(^)} C + blockdiag^ 2 I p , p^y^I^ fi q{1/a 2 r) I Kr } 
£ is an n x 1 vector of positive variational parameters, X(x) = tanh(x/2) /(4 x), and 

with C = [X Z] as before. This family of approximations is due to Jaakkola & Jordan 
(2000) and their genesis is given there. Section 3.1 of Ormerod & Wand (2010) explains this 
approximation strategy using notation similar to that used here. Jaakkola & Jordan (2000) 
also present an Expectation-Maximization argument that results in 



diagonal[C{Sg (/ 3 iU . 4)) + M g (/3, u; £) ^, u; £)} C 1 

being the optimal update for the £ vector. Algorithm [5] is the online MFVB algorithm that 
arises from appropriately modifying the batch MFVB algorithm for ( |T4] | with the Jaakkola 
& Jordan (2000) strategy. 

An alternative route to an online MFVB algorithm for binary response linear mixed 
models involves the probit link and the Albert & Chib (1993) auxiliary variable strategy. 
Batch MFVB algorithms for models of this general type have been developed by Girolami 
& Rogers (2006) and Consonni & Marin (2007). Modification of these algorithms for the 
probit link version of |l4| | should lead to an algorithm that performs online approximate 
inference similar to that performed by Algorithm [5] 

Figure |5]performs batch-based convergence diagnostics for a binary response nonpara- 
metric regression example. This is a special case of ( p~4| ) with r = 1 and Z containing spline 
basis functions. New predictor/response pairs (x new , y new ) were generated according to 

x ne „ ~ Uniform(0, 1), y n ew|x new ~ Bemoulli(logit _1 (cos(47rx new ) + 2x new - 1)). (16) 

The analogues of Steps 1.-5. of Algorithm 2' were applied with an initial trial involving 
"warm = 100 and n va iid = 100. The Bayes estimates and 95% credible sets of the logit- 
transformed mean function at each of quartiles of the x-values, as well as log(<r^), are 
shown in the upper row of Figure [5] However, they have noticeable disagreement, which 
indicates non-convergence of the online MFVB results to their batch counterparts and that 
"warm should be increased. Setting n warm = 300 leads to the more concordant results 
shown in the lower row of Figure |5j indicating adequacy of this warm-up size. We have 
found this behaviour typical for binary response online MFVB and this simple example 
demonstrates the importance of batch-based tuning and convergence diagnostics. 



4 Live Internet Demonstrations 



We have launched the web-site: realtime-semiparametric-regression.net for 
displaying live real-time semiparametric regression analyses. Links on this web-site point 
to several examples, and we anticipate that the set of examples will grow during the next 
few years. At the time of this writing, the examples involve two types of real-time data: 
stock prices from the U.S. National Association of Securities Dealers Automated Quotations 
(NASDAQ) and the London Stock Exchange in the United Kingdom, and features of prop- 
erty rentals in Sydney, Australia. 



4.1 Stock price data 

In this set of examples, the predictor and response variable pairs correspond to pairs of 
stock prices. An example nonparametric regression model is 

(Microsoft stock price)j| /3, u, a\ ~' N(/3q + /(( Intel stock price ) j), af) (17) 
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Algorithm 5 Online mean field variational Bayes algorithm for approximate inference in the bi- 



nary response logistic mixed model (141 



1. Perform batch-based tuning runs analogous to those described in Algorithm 2' and 
determine a warm-up sample size n warm for which convergence is validated. 

2. Set y warm and C waim to be the response vector and design matrix, and 
£warm to De tne vector of variational parameters, based on the first 
"warm observations. Then set C T {y - \l) <- C^ arm (y warm - 
C T diag{A(£)}C <- C^ arm diag{A(£ warm )}C war m, n <- n warm . Also, set 

S 9(/3,«;€)' / i ff(i/«^ 1 )'"-' /i 9( 1 /«^r) to be the values for these quantities 
obtained in the batch-based tuning run with sample size n warm . 

3. Cycle: 

read in y new (1 x 1) and c new (Pxl) ; n <- n + 1 



C T (</ - |1) <- C T (y - |l) + (y new - i) 
C T diag{A(£)}C <- C r diag{A(£)}C + A(£) c ne w c n T ew 

S g(/3,™) <- [2C T diag{A(£)} C + blockdiag{<r^ 2 I p , n^y^I^, . . . , fi q(1/a 2 r) I Kr } 

Vq(f3,u) *~ S g(/3,u) C T (y - |l) 

For I = 1, . . . , r : 



-i 



g/ + l 

2 „ q(lM + \\„ qiue) \\* + tr(S ?M ) 
until data no longer available or analysis terminated. 
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Figure 5: Convergence diagnostics for a binary response nonparametric regression example with 
data generated according to ( |l6] >. The solid lines track the posterior means, whilst the dashed lines 
show corresponding 95% credible sets. First row: the horizontal axes show the sample sizes between 
a warm-up batch sample of size n warm = 100 and validation sample sizes up to n m n^ = 100 greater 
than n war m- Second row: as for the first row, but with n warm = 300. 



where f(x) = j3\x + Ylk=i Uk z k{ x ) is a penalized spline function as described in Section 



2.2 with the same distribitutional structures imposed on the model parameters. In ad- 
dition, (Microsoft stock price); and (Intel stock price); denote the ith stock 
price for the U.S. companies Microsoft Corporation and Intel Corporation, respectively, for the 



current trading day. The web-site displays fitting of (17 1 in real-time during the NASDAQ 
opening hours (9:30am to 4:00pm North American Eastern Standard Time). The R pack- 
age quantmod (Ryan, 2012) is used to obtain the NASDAQ data from the Yahoo! Finance 
web-site (finance . yahoo . com). 

A similar series of examples is set up using London Stock Exchange data during stock 
market opening hours (8:00 am to 4:20 pm Greenwich Mean Time). Note that Yahoo! Fi- 
nance delays London Stock Exchange data by 20 minutes. 

Depending on the example and the live data-set, the appropriateness of the nonpara- 



metric regression model (17 1 could be questionable and more sophisticated models enter- 
tained. Hence, these examples should only be viewed as simple illustrations of the concept 
of real-time semiparametric regression. 



4.2 Sydney property rental data 

This example involves real-time semiparametric regression analysis of data from the prop- 
erty rental market in Sydney, Australia. Each day, hundreds of properties come on the 
Sydney market and these fresh data are usually advertised on rental agency web-sites and 
real estate web-sites as realestate . com . au. This offers the possibility to perform real- 
time analysis and produce live and up-to-date summaries of the rental market status. An 
attractive approach to model such data is the special case of semiparametric regression 
known as geoadditive models (Kammann and Wand, 2003). Explicitly, we work with the 
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model 



log((weekly rent)y)| (3, Ui, u 2 , u 3 , « 4 , u 5 , a 2 £ ~ 

N((3q + j3\ houses + f%{ (number of bedrooms ) 

+/3( (number of bathrooms) y) + / 4 ( (number of car spaces) y) 
+/5(longitudejj, latitude^) + C/j, <rf), 



(18) 



f^l) • • • ) ^992 | Cj 




N(0,al). 



Here, (weekly rent)^ is the weekly rental amount in Australian dollars of the jth prop- 
erty for the ith real estate agency (hereafter called the (i, j)th property), and house^ is 
an indicator of the (i, j)th property being a house, townhouse or villa (rather than an 
apartment). The variable (number of bedrooms) ij is the number of bedrooms in the 
(i,j')th property. Variables concerning the numbers of bathrooms and car spaces are de- 
fined similarly. The geographical location of the (i, j)th property is conveyed by the vari- 
ables longitude^- and latitude^. The Ui, 1 < i < 992, are random intercepts for each 
of the 992 agencies. The fixed effect regression coefficients (3q, (3± and the linear contri- 
bution to /2, . . . , /s are stored in f3. Similarly, the spline basis coefficients for f 2 , . . . , are 
stored in u?, . . . , u$. The estimate of /s is based on bivariate thin plate splines as explained 
in Chapter 13 of Ruppert, Wand & Carroll (2003). 

The web-site for this example displays fitting of ( |T8") | in real time based on data col- 
lected since 9th May, 2012. A number of regression summaries are presented. Firstly, a 
geographical map is listed with processed properties as small black dots and recently (i.e. 
during the last hour) added ones as yellow circles. The total number of processed prop- 
erties is included at the bottom right. Next, a color-coded geographical map displays the 
weekly rent for a two bedroom apartment with one bathroom and one car space for var- 
ious geograhical locations. The approximate posterior density function for j3% shows the 
impact of the property being a house or not. Regression fits and 95% credible sets for the 
number of bedrooms, bathrooms and car spaces for apartments are presented. Finally, a 
list of rental agencies with the least and most expensive properties, after correcting for all 
other covariates, is provided. All these regression summaries are computed in real time 
and the figures are updated every hour. 
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